TEXT SENTENCE COMPARING APPARATUS 

The present disclosure relates to the subject matter 
contained in Japanese Patent Application No. 2002-269193 filed 
on September 13, 2002, Japanese Patent Application 
No. 2002-071273 filed on March 15, 2003, and Japanese Patent 
Application No. 2003-071274 filed on March 15, 2003, which are 
incorporated herein by reference in their entirety. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to an apparatus/method 
for comparing text sentences with each other to check differences 
in semantic contents by using, for example, a computer. More 
specifically, the present invention relates to an 
apparatus/method for comparing text sentences with high precision 
and in real time. 

2. Description of the Related Art 

Since information technology, and especially high-speed mobile 
Internet technology, has made rapid progress, 
very large amounts of information may be utilized by anybody, 
anywhere, and at any time. Conversely, a so-called 
"information-flood phenomenon" may occur, so that users can 
hardly acquire the information which they truly require. 
To realize a world in which proper information 
can be continuously acquired under any conditions of users, 
the information which has true value for these users must 
be extracted and reconstructed from such an information flood. 

In this case, techniques for comparing semantic contents 
of documents with each other, techniques for classifying text 
documents in accordance with the semantic contents, and 
techniques for understanding the information searching 
intentions of users may constitute important aspects. Also, 
in order to realize the comparisons of the semantic contents 
of the documents, the classifications of the text documents, 
and the understanding of the information searching intentions 
of the users, similarity judgments as to meaning that utilize 
natural language processing technologies are 
required. 

In this field, several sorts of technical ideas for judging 
similarity between text sentences have been proposed. However, 
the major technical ideas among them utilize local information 
of sentences, for example, information on the words appearing in 
the sentences and dependency relation information between words, 
and therefore can hardly be applied as evaluation bases for the 
semantic contents of text sentences. Namely, they cannot realize the 
goal of comparing the semantic contents of documents 
with each other and understanding the information searching 
intentions of the users. 

Very recently, the following method has been proposed. That 
is, text sentences are semantically analyzed, the analyzed text 
sentences are represented in the form of graphs, and then 
an experimental similarity is measured based upon the graph 
representations. However, the proposed similarity is 
measured without considering structural changes, and there 
is no clear definition of the relationship between the definition 
of the similarity and the differences in the semantic contents 
of the text sentences. 

As examples of the conventional techniques related to 
the present invention, the below-mentioned prior art has been 
proposed. 

[Non-Patent Publication 1] 

"Japanese Semantic Analysis System SAGE using EDR" 
written by Harada and Mizuno, "Japanese Society for Artificial 
Intelligence" in 2001, 16(1), pages 85 to 93. 

[Non-Patent Publication 2] 

"A Quantitative Representation of Features based on Words 
and Documents Co-occurences" written by Shoko Aizawa, "Natural 
Language Processing" in March, 2000, 136-4. 

[Non-Patent Publication 3] 

"Self-Organizing Semantic Map of Japanese Nouns" written 
by Q. Ma, "Information Processing Society of Japan", volume 
42, No. 10, in 2001. 

As described above in connection with the prior art, the 
conventional systems have the problem that the performance 
of comparing the similarity of the semantic contents between 
text sentences is still inadequate. Also, the 
conventionally proposed similarity can hardly be linked to 
explanations of the differences in the semantic contents 
between the text sentences. 

SUMMARY OF THE INVENTION 
The present invention has been made to solve the 
above-explained problems. It is an object of the invention 
to provide an apparatus and a method which can compare 
differences in semantic contents between text sentences with high 
precision and in real time. More specifically, in the 
text sentence comparing apparatus/method according to the 
present invention, for instance, in order to realize comparisons 
between semantic contents of documents, classifications of text 
documents based on semantic contents, and understanding of 
information searching intentions of users, a distance which 
can measure differences in semantic contents between text 
sentences is defined in a mathematical formalism. Also, this 
distance can be obtained in real time. 

In order to achieve the above-described object, in a text 
sentence comparing apparatus according to the present invention, 
comparing operations between text sentences are carried out 
in accordance with the below-mentioned manner. 

In other words, a tree representing section represents 
text sentences to be compared with each other as rooted trees 
in graph theory. An information applying section applies 
information produced based on the text sentences to respective 
vertexes of the trees represented by the tree representing 
section and also applies case information, which is dependency 
relation information between words, to respective edges. A 
tree distance defining section defines a distance between the 
trees, which is based on a correspondence relationship among 
the vertexes and among the edges. A tree distance acquiring section 
acquires the distance between the trees defined by the tree 
distance defining section. A tree distance applying section 
applies the distance between the trees to a distance indicative 
of a difference (or similarity) between the text sentences. 
A text sentence distance acquiring section acquires a distance 
between the text sentences to be compared with each other based 
on the application by the tree distance applying section. 

Therefore, as to two text sentences to be compared with 
each other, the entire structures and the meaning of the 
text sentences are represented as rooted trees in graph 
theory. Then, a semantic difference between these two text 
sentences can be evaluated based on a distance between these 
two text sentences, which is calculated by applying thereto 
a distance between the two trees, so that the comparing operation 
between the text sentences can be carried out with high precision 
and in real time. 

In this case, in accordance with the invention, since 
distances between trees in graph theory are applied to 
comparing operations of text sentences, not only the word 
information and case information contained in these text 
sentences, but also the structures of these text sentences are 
taken into consideration. The invention applies the word 
information to the vertexes of the trees and also applies the 
case information to the edges of the trees. 

Also, distances between text sentences may be classified 
into two sorts of distances, depending on whether trees which 
are rooted and ordered or trees which are rooted and not ordered 
are employed. Either of the two sorts of distances can be 
selected based on the calculation speed and comparison precision 
required in the application field. 

It should be understood that a tree which is rooted 
and ordered in graph theory is referred to as an "RO tree 
(Rooted and Ordered Tree)", whereas a tree which is rooted 
and not ordered is referred to as an "R tree (Rooted and Unordered 
Tree)" in this specification. 

When an RO tree is compared with an R tree, generally 
speaking, the distance between RO trees can be calculated more simply 
than the distance between R trees, whereas the precision of meaning 
comparison with R trees is higher than that with RO trees. 
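
The difference between the two tree types can be illustrated with a minimal sketch. It is not part of the disclosed apparatus: the names `Node`, `ordered_key`, and `unordered_key` and the example words are assumptions made for illustration only. The sketch shows only that sibling order matters for an RO tree and does not matter for an R tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical vertex: a word plus the case label of the edge to its parent."""
    word: str
    case: str = "NULL"
    children: List["Node"] = field(default_factory=list)


def ordered_key(node):
    # RO tree view: children are kept in their given left-to-right order.
    return (node.word, node.case, tuple(ordered_key(c) for c in node.children))


def unordered_key(node):
    # R tree view: sibling order is ignored by sorting the child keys.
    return (node.word, node.case,
            tuple(sorted(unordered_key(c) for c in node.children)))


a = Node("eat", children=[Node("cat", "SUBJ"), Node("fish", "OBJ")])
b = Node("eat", children=[Node("fish", "OBJ"), Node("cat", "SUBJ")])

same_as_ro_trees = ordered_key(a) == ordered_key(b)     # False: order differs
same_as_r_trees = unordered_key(a) == unordered_key(b)  # True: same vertex set
```

Because an R tree admits any pairing of siblings, a distance over R trees has more candidate mappings to examine than one over RO trees, which is consistent with the trade-off between calculation speed and precision stated above.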

Also, in accordance with the present invention, various 
sorts of information may be employed as the word information. 
For example, the word information may include word attribute 
information. This word attribute information, for example, 
may include part-of-speech information, which is acquired by 
way of a morphological analysis. Also, in the case of a verb, 
information as to a conjugation may be used. 

Also, a sort of dependency relation between words 
corresponds to a case. 

Also, the word information and the case information may 
be obtained by, for example, semantically analyzing a text 
sentence. Alternatively, the word information and the case 
information (in this case, dependency relation information) 
may be obtained by syntactically analyzing the text sentence 
and analyzing its dependency relations. 

Also, as a mapping condition between R trees, for example, 
a condition that "the mapping is a one-to-one mapping, 
parent-child relationship (hierarchical relationship) is 
preserved, structures of R trees are preserved, and the mapping 
between vertexes does not intersect with the mapping between 
edges" may be used for a mapping between vertexes and a mapping 
between edges. 

Also, as a mapping condition between RO trees, for example, 
a condition that "the mapping is a one-to-one mapping, 
parent-child relationship (hierarchical relationship) is 
preserved, brother relationship is preserved, structures of 
RO trees are preserved, and the mapping between vertexes does 
not intersect with the mapping between edges" may be used for 
a mapping between vertexes and a mapping between edges. 

Also, when a tree A is mapped to a tree B, for instance, 
a case in which a vertex of the tree A is mapped to a vertex 
of the tree B corresponds to a "substitution of vertex"; a vertex 
which is located in the tree A and cannot be mapped corresponds 
to a "deletion of vertex"; and a vertex which is located in 
the tree B and cannot be mapped corresponds to an "insertion 
of vertex". Also, a case in which an edge of the tree A is mapped to an edge 
of the tree B corresponds to a "substitution of edge"; an edge 
which is located in the tree A and cannot be mapped corresponds 
to a "deletion of edge"; and an edge which is located in the 
tree B and cannot be mapped corresponds to an "insertion of 
edge". 

Also, as a distance between trees, for example, the minimum 
value of the sum of weights (sum of mapping weights) in a case where 
one tree is mapped to another tree may be employed. Further, 
this distance between trees implicitly includes a distance 
between forests. 
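
The sum of mapping weights for one candidate mapping can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function name `mapping_weight`, the dictionary layout, the unit weights, and the example words are all hypothetical, and an edge is assumed to be mapped whenever the vertex below it is mapped. The sketch does not check the mapping conditions; the distance between trees would be the minimum of this sum over all admissible mappings.

```python
def mapping_weight(tree_a, tree_b, vertex_map, w_sub=1.0, w_del=1.0, w_ins=1.0):
    """Sum the weights of one given mapping from tree_a to tree_b.

    Each tree is a dict: vertex id -> (word, case), where `case` labels the
    edge to the parent and is None for the root (assumed layout).
    """
    total = 0.0
    # Mapped vertexes: substitution of vertexes and of the edges above them.
    for va, vb in vertex_map.items():
        word_a, case_a = tree_a[va]
        word_b, case_b = tree_b[vb]
        if word_a != word_b:
            total += w_sub
        if case_a is not None and case_b is not None:
            if case_a != case_b:
                total += w_sub
        elif case_a is not None:
            total += w_del          # edge exists only in tree A
        elif case_b is not None:
            total += w_ins          # edge exists only in tree B
    # Unmapped vertexes of tree A: deletion of the vertex and of its edge.
    for va, (_, case) in tree_a.items():
        if va not in vertex_map:
            total += w_del + (w_del if case is not None else 0.0)
    # Unmapped vertexes of tree B: insertion of the vertex and of its edge.
    mapped_b = set(vertex_map.values())
    for vb, (_, case) in tree_b.items():
        if vb not in mapped_b:
            total += w_ins + (w_ins if case is not None else 0.0)
    return total


tree_a = {0: ("teach", None), 1: ("teacher", "SUBJ"),
          2: ("English", "OBJ"), 3: ("today", "ADJUNCT")}
tree_b = {0: ("teach", None), 1: ("teacher", "SUBJ"), 2: ("French", "OBJ")}

# One word substitution, one vertex deletion, one edge deletion.
weight = mapping_weight(tree_a, tree_b, {0: 0, 1: 1, 2: 2})
```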

Also, as a method of applying numbers to respective 
vertexes and respective edges of either an RO tree or an R tree, 
for example, the following method may be utilized. That is, 
while the numbers are allotted to the respective vertexes and 
respective edges in increasing order by way of a 
depth-first search, distances are calculated 
in order from the vertexes having larger numbers. 
Specifically, distances are sequentially calculated from the 
subtrees located on the lowest side to the subtrees located on the 
upper side by employing a dynamic programming method. 
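
The numbering and the bottom-up processing order described above can be sketched as follows. This is only a sketch under assumptions: the function names and the adjacency-dictionary layout are hypothetical, and a subtree size is computed as a stand-in for the per-subtree distance table, which is not reproduced here.

```python
def number_depth_first(children, root):
    """Allot increasing numbers to vertexes in depth-first (preorder) order."""
    order = {}
    stack = [root]
    while stack:
        v = stack.pop()
        order[v] = len(order) + 1
        # Push children reversed so the leftmost child is numbered first.
        stack.extend(reversed(children.get(v, [])))
    return order


def subtree_sizes(children, root):
    """Process vertexes from the largest number down, so every child is
    finished before its parent (bottom-up, as in dynamic programming)."""
    order = number_depth_first(children, root)
    size = {}
    for v in sorted(order, key=order.get, reverse=True):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return order, size


# Hypothetical tree: a has children b and c; b has child d.
children = {"a": ["b", "c"], "b": ["d"]}
order, size = subtree_sizes(children, "a")
```

Because a preorder number of a child is always larger than that of its parent, visiting vertexes in decreasing number guarantees that the result for each lower subtree is available when the subtree above it is evaluated.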

Also, a label is used in order to store the information. 

Furthermore, a structural example of the present 
invention will now be described as follows: 

(1) An apparatus for comparing semantic contents of text 
sentences obtains a distance measuring semantic contents 
between text sentences. The comparing apparatus includes means 
for representing structures and meaning of the entire text 
sentences as RO trees or R trees, means for applying word 
information and dependency relation information between words 
(or case information) to vertexes and edges of the RO trees 
or the R trees, respectively, means for defining a distance 
between RO trees or R trees, which is based on correspondence 
relations between the vertexes and between the edges, means for 
obtaining the defined distance between the RO trees or the R 
trees, means for applying the distance between the RO trees 
or R trees to a distance comparing semantic differences between 
the text sentences, and means for obtaining the distance between 
the text sentences. 

(2) The means for defining the distance between RO trees 
or R trees, which is based on the correspondence relations 
between the vertexes and the edges, includes label allocation 
means for allocating labels to each vertex and each edge of 
the RO trees or R trees in graph theory, number allocation 
means for allocating a number to each vertex and each edge of 
the RO trees or R trees, mapping means for performing mapping 
between the RO trees or the R trees, on the basis of the 
correspondence relations between the vertexes and between the 
edges and mapping conditions between the RO trees or the R trees, 
which are based on the correspondence relations between the vertexes 
and between the edges, mapping means for performing mapping 
between ordered or unordered forests based on the correspondence 
relations between the vertexes and between the edges, mapping 
weight setting means for defining weights of the mappings 
performed by these mapping means, means for defining a distance 
between the ordered or unordered forests based on the mapping 
means for performing the mapping between the ordered or unordered 
forests and the mapping weight setting means, and means for 
defining a distance between the RO trees or R trees based on 
the mapping means for performing the mapping between the RO 
trees or the R trees and the mapping weight setting means. 

(3) The means for applying the distance between the RO 
trees or R trees to a distance comparing semantic differences 
between the text sentences includes means for making the mapping 
between the words correspond to the mapping between the vertexes 
of the RO trees or the R trees, means for making the word mapping 
weights correspond to the vertex mapping weights of the RO trees 
or the R trees, means for making the case mapping weights 
correspond to the edge mapping weights of the RO trees or the 
R trees, means for setting the word mapping weights, and means 
for setting the case mapping weights. 

(4) The means for obtaining the distance between the text 
sentences sets the distance obtained by the means for obtaining 
the distances between either the RO trees or the R trees as 
the distance between the text sentences. 

(5) The means for obtaining the distance between the text 
sentences sets, as the distance between the text sentences, a 
result obtained by dividing the distance 
obtained by the means for obtaining the distances between the 
RO trees or the R trees by the sum of the total numbers of vertexes 
of the RO trees or the R trees. 

(6) The means for setting the mapping weights between 
the words includes means for setting the substitution weights 
between the words stored in each vertex when two vertexes 
are mapped in the mapping between the RO trees or the R trees, 
means for setting the deletion weights of the words stored in 
each vertex when the vertexes cannot be mapped and are deleted, 
means for setting the insertion weights of the words stored 
in each vertex when the vertexes cannot be mapped and are inserted, 
and means for setting a relation among the word substitution weights, 
the word deletion weights, and the word insertion weights. 

(7) The means for setting the case mapping weights includes 
means for setting the case substitution weights between cases 
stored in each edge when two edges are mapped in the mapping 
between the RO trees or the R trees, means for setting the 
deletion weights of the cases stored in the edges when the edges cannot be 
mapped and are deleted, means for setting the 
insertion weights of the cases stored in the edges when the edges cannot be mapped and 
are inserted, and means for setting a relation among the case 
substitution weights, the case deletion weights, and the case 
insertion weights. 

(8) The means for setting the word substitution weight 
includes means for setting the word substitution weight to 0 
when two words are the same word, and means for setting a positive 
constant value as the word substitution weight when the two 
words are different. 

(9) The means for setting the word substitution weights 
sets the word substitution weights as a distance between two 
words. 

(10) The means for setting the word deletion weight sets 
the word deletion weight as a constant. 
(11) The means for setting the word deletion weight sets 
the word deletion weight based upon a part-of-speech of the 
word. 

(12) The means for setting the word insertion weight sets 
the word insertion weight as a constant. 

(13) The means for setting the word insertion weight sets 
the word insertion weight as a constant. 

(14) The means for setting the relation among the word 
substitution weight, the word deletion weight, and the word 
insertion weight establishes a relationship satisfying "the 
word deletion weight + the word insertion weight > the word 
substitution weight". 

(15) The means for setting the case substitution weight 
includes means for setting the case substitution weights to 
zero when two cases are identical to each other, and means for 
setting the case substitution weights to positive constants 
when two cases are different from each other. 

(16) The means for setting the case substitution weight 
includes means for classifying all of the cases into 
N categories, means for setting the substitution weight 
between the categories of the cases, and means for setting the 
substitution weight between two cases as the substitution weight 
between the categories to which the two cases respectively belong. 

(17) The means for setting the case deletion weight sets 
the case deletion weight as a constant. 

(18) The means for setting the case deletion weight sets 
the case deletion weight based upon a sort of a case. 

(19) The means for setting the case insertion weight sets 
the case insertion weight as a constant. 

(20) The means for setting the case insertion weight sets 
the case insertion weight based upon a sort of a case. 

(21) The means for setting the relation among the case 
substitution weight, the case deletion weight, and the case 
insertion weight establishes a relation satisfying "the 
case deletion weight + the case insertion weight > the case 
substitution weight". 

(22) A method for comparing semantic contents of text 
sentences obtains a distance measuring semantic contents between 
text sentences. The comparing method includes representing 
structures and meaning of the entire text sentences as RO trees 
or R trees, applying word information and dependency relation 
information between words (or case information) to each vertex 
and each edge of the RO trees or the R trees, obtaining the 
defined distance between the RO trees or the R trees based on 
correspondence relations between the vertexes and between the 
edges, applying the distance between the RO trees or the R trees 
to a distance comparing semantic differences between the text 
sentences, and obtaining the distance between the text 
sentences. 
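
Several of the weight settings enumerated above, namely items (5), (8), (14) to (16), and (21), can be sketched together as follows. This is a hedged sketch and not the disclosed implementation: every function name, the category names "core" and "peripheral", and all numeric values are assumptions chosen only to make the example concrete.

```python
def word_substitution_weight(word_a, word_b, mismatch=1.0):
    # Item (8): zero for the same word, a positive constant otherwise.
    return 0.0 if word_a == word_b else mismatch


def case_substitution_weight(case_a, case_b, category_of, category_weight):
    # Item (15): identical cases cost nothing.
    if case_a == case_b:
        return 0.0
    # Item (16): otherwise use the weight between the two case categories.
    key = frozenset((category_of[case_a], category_of[case_b]))
    return category_weight[key]


def weights_consistent(substitution, deletion, insertion):
    # Items (14) and (21): deletion + insertion must exceed substitution.
    return deletion + insertion > substitution


def normalized_distance(tree_distance, n_vertexes_a, n_vertexes_b):
    # Item (5): divide the tree distance by the total number of vertexes.
    return tree_distance / (n_vertexes_a + n_vertexes_b)


# Hypothetical two-category classification of cases and its weight table.
category_of = {"SUBJ": "core", "OBJ": "core", "ADJUNCT": "peripheral"}
category_weight = {
    frozenset({"core"}): 0.5,
    frozenset({"peripheral"}): 0.5,
    frozenset({"core", "peripheral"}): 1.0,
}

w = case_substitution_weight("SUBJ", "ADJUNCT", category_of, category_weight)
d = normalized_distance(3.0, 4, 3)
```

The relation in items (14) and (21) means that replacing a word or case is never more expensive than deleting it and inserting the other, so a minimum-weight mapping is not pushed toward deleting and re-inserting vertexes or edges that could have been substituted.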

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a diagram for indicating a structural example 
of an apparatus for comparing semantic content between text 
sentences according to an embodiment of the present invention. 

Fig. 2 is a diagram for showing a structural example in 
the case that an apparatus/method for comparing 
semantic contents between text sentences, according to the 
present invention, is applied to an information terminal 
apparatus. 

Fig. 3 is a diagram for indicating an example of an analysis 
result made by a morphological analysis section. 

Fig. 4 is a diagram for representing an example of a 
representation of a tree structure. 

Fig. 5 is a diagram for indicating an example of a data 
construction of a table (list) as to distances among case 
categories. 

Fig. 6 is a diagram for indicating an example of two 
subtrees which are constituted by either RO trees or R trees. 

Fig. 7 is a diagram for indicating an example of two forests 
which are constituted by either RO trees or R trees. 

Fig. 8 is a diagram for showing an example of a weighted 
bipartite graph. 

Fig. 9 is a diagram for representing tree structures of 
a Japanese sentence "A" and another Japanese sentence "B". 

Fig. 10 is a diagram for showing an example of a mapping 
operation for applying distances between RO trees of the Japanese 
sentence A and the Japanese sentence B. 

Fig. 11 shows various mappings between two trees. 

Fig. 12 shows a calculation procedure of the distance 
between RO trees. 

Fig. 13 shows a calculation procedure of the distance 
between R trees. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
Referring now to the drawings, an embodiment of the invention 
will be described. 

Fig. 1 shows an embodiment of an apparatus for comparing 
semantic contents of text sentences with each other (text 
sentence comparing apparatus) according to the embodiment of 
the invention. This text sentence comparing apparatus executes 
a method of comparing semantic contents of text sentences with 
each other according to the embodiment of the invention. 

The text sentence comparing apparatus shown in this 
drawing includes an external storage apparatus 1, a 
morphological analysis section 2, a semantic analysis section 
3, a tree structure conversion section 4, a word-mapping-weight 
calculation section 5, a case-mapping-weight calculation 
section 6, a distance calculation section 7, a semantic content 
comparison section 8, a storage section 9, and a plurality of 
memories 10 to 18. The morphological analysis section 2 
extracts morphemes of a text sentence. The semantic analysis 
section 3 analyzes the meaning of a text sentence. The tree 
structure conversion section 4 converts an analyzed result of 
the semantic analysis section 3 into either an RO tree or an 
R tree in graph theory. The word-mapping-weight calculation 
section 5 calculates a word substitution weight when two words 
are substituted, a word deletion weight when a word is deleted, 
and a word insertion weight when a word is inserted. The 
case-mapping-weight calculation section 6 calculates a case 
substitution weight when two cases are substituted, a case 
deletion weight when a case is deleted, and a case insertion 
weight when a case is inserted. The distance calculation 
section 7 calculates a distance between either RO trees or R 
trees. The semantic content comparison section 8 acquires a 
difference in semantic content between text sentences. The 
storage section 9 is constituted by, for example, a memory. 

Also, data of text sentences is stored in the external 
storage apparatus 1. 

The memory 10 and the memory 11 store data of two text 
sentences read from the external storage apparatus 1, 
respectively. The memory 12 and the memory 13 store analysis 
results of the two text sentences made by the morphological 
analysis section 2, respectively. The memory 14 and the memory 
15 store semantic analysis results of the two text sentences 
made by the semantic analysis section 3, respectively. The 
memory 16 and the memory 17 store conversion results made by 
the tree structure conversion section 4 as to the two text 
sentences, respectively. The memory 18 stores either a distance between the 
RO trees or a distance between the R trees, which is calculated 
by the distance calculation section 7. 

Alternatively, it should be noted that these memories 
10 to 18 may be combined with each other, or a text sentence 
comparing apparatus may be formed without using these memories 
10 to 18. 

The morphological analysis section 2 extracts both 
morphemes and attributes of the two text sentences stored in 
the memory 10 and the memory 11, and then stores the analysis 
results of the respective text sentences into the memory 12 
and the memory 13, respectively. 

The semantic analysis section 3 receives the morphological 
analysis results stored in the memory 12 and the memory 13, 
analyzes meanings of the text sentences, and then stores the analysis 
results of the text sentences into the memory 14 and the memory 
15, respectively. 

The tree structure conversion section 4 converts the 
semantic analysis results stored in the memory 14 and the memory 
15 into either RO trees or R trees, and then stores word 
information (including attributes of words) appearing in the 
text sentences into vertexes of either the converted RO trees 
or the converted R trees, and also stores relevant case 
information appearing in the text sentences into edges of the 
RO trees or the R trees. 

Also, the tree structure conversion section 4 stores the 
converted results as to the text sentences into the memory 16 
and the memory 17, respectively. 

The word-mapping-weight calculation section 5 calculates 
a word substitution weight, a word deletion weight, and a word 
insertion weight, which are required to calculate either a 
distance between RO trees or a distance between R trees, and 
then supplies them to the distance calculation section 7. 

The case-mapping-weight calculation section 6 calculates 
a case substitution weight, a case deletion weight, and a case 
insertion weight, which are required to calculate either a 
distance between RO trees or a distance between R trees, and 
then supplies them to the distance calculation section 7. 

The distance calculation section 7 calculates a distance 
between either the two RO trees or the two R trees stored in 
the memory 16 and the memory 17, and then stores the calculated 
result into the memory 18. 

The semantic content comparison section 8 calculates a 
distance between the text sentences by using either the distance 
between the RO trees or the distance between the R trees stored 
in the memory 18, and then stores the calculated result into 
the storage section 9. 

Next, a construction example of an information terminal 
apparatus to which an apparatus and a method for calculating 
a distance used to compare semantic contents between text 
sentences, according to the invention, are applied will be 
described as an application example. 

Fig. 2 shows a construction example of an apparatus to 
which the method for calculating the distance used to compare 
the semantic contents between the text sentences, according 
to the present invention, is applied, as the application example. 

The information terminal apparatus 20 shown in Fig. 2 
includes an external storage apparatus 21, a keyboard 22, a 
display 23, and a processor unit 24. This processor unit 24 
is equipped with a module 25 for obtaining a distance between 
text sentences. 

The external storage apparatus 21 stores data 
of input text sentences, either a word feature dictionary or 
a thesaurus dictionary, which is used to obtain a word 
mapping weight, a weight dictionary used to obtain a case mapping 
weight, a result of a calculated distance between text sentences, 
software, and the like. This external storage apparatus 21 also 
functions as a storage space used in a calculation. In this 
case, the word feature dictionary, the thesaurus dictionary, 
the weight dictionary, and the like may, for example, 
be formed in advance, or existing 
dictionaries may be prepared. Also, specifically, the external 
storage apparatus 21 may be constituted by, for instance, a 
hard disk drive. 

The keyboard 22 is an input apparatus used to instruct 
an operation by a user. It should also be noted that another 
input apparatus may be added thereto. 

The display 23 corresponds to an output apparatus for 
displaying a message to the user, data 
or a text sentence, an analysis result, a calculation result 
of a distance, and the like. It should also be noted that another 
output apparatus may be additionally provided. 

The processor unit 24 executes an actual process operation 
in accordance with the software or the like stored in the external 
storage apparatus 21. Specifically, this processor unit 24 
may include, for example, a computer system such as a 
microprocessor and a personal computer. Then, the 
morphological analysis section 2, the semantic analysis section 
3, the tree structure conversion section 4, the 
word-mapping-weight calculation section 5, the 
case-mapping-weight calculation section 6, the distance 
calculation section 7, and the semantic content comparison 
section 8 may be constructed by the software operated on this 
processor unit 24. 

Next, operations of the apparatus for comparing 
differences in semantic contents between text sentences 
according to the embodiment of the present invention will now 
be explained in detail. 

The external storage apparatus 1 stores 
data of text sentences. The data of the two text sentences 
are read out from the external storage apparatus 1, and then 
are stored into the memory 10 and the memory 11, respectively. 
The morphological analysis section 2 extracts the morphemes 
of the text sentences stored in the memory 10 and the memory 
11, and then stores the extracted results into the memory 12 
and the memory 13, respectively. 

In this case, as the morphological analysis tool, 
any published morphological analysis tool 
may be utilized. For instance, the morphological 
analysis tool "ChaSen" may be used, which has been produced 
by Matsumoto Laboratory of Nara Institute of Science and 
Technology. 

Also, Fig. 3 indicates an analysis result of a 
morphological analysis with respect to the sentence "a teacher 
teaches English to students". 

The semantic analysis section 3, which performs syntactic and 
semantic analysis, receives the results of the morphological analysis stored in 
the memory 12 and the memory 13, analyzes sentence structures 
of the text sentences, dependency relations (or case information) 
of the text sentences, deep structures of the text sentences, 
and the like, and then stores the analyzed results into the 
memory 14 and the memory 15, respectively. 

Here, as a syntax analysis tool and a semantic analysis 
tool, any known syntax analysis tool and any known semantic 
analysis tool may be utilized. For 
example, the method described in Non-Patent Publication 
1 may be employed (see Non-Patent Publication 1). 

The tree structure conversion section 4 receives 
the analysis results stored in the memory 14 and the memory 15, 
converts the analysis results into tree structures, 
and then stores the converted tree structures into the memory 
16 and the memory 17, respectively. 

Fig. 4 shows a tree structure into which the result of
the semantic analysis of the text sentence "a teacher
teaches English to students" is converted. As word information
and case information, "a teacher" and "SUBJ", "English" and
"OBJ", "students" and "OBJ", and "teach" and "NULL" are stored
in the vertexes, respectively.
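The tree of Fig. 4 can be pictured as a small data structure. The following is only a minimal sketch, assuming that each vertex stores its word information together with the case information of the edge to its parent; the names `Vertex`, `fig4_tree`, and `count_vertexes` are hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vertex:
    """One vertex of the tree: a word plus the case label of the edge to its parent."""
    word: str
    case: str = "NULL"  # the root has no parent edge, so its case is NULL (empty)
    children: List["Vertex"] = field(default_factory=list)


def fig4_tree() -> Vertex:
    """Build the tree for "a teacher teaches English to students" as listed for Fig. 4."""
    return Vertex("teach", "NULL", [
        Vertex("a teacher", "SUBJ"),
        Vertex("English", "OBJ"),
        Vertex("students", "OBJ"),
    ])


def count_vertexes(v: Vertex) -> int:
    """Count the vertexes of the subtree rooted at v."""
    return 1 + sum(count_vertexes(c) for c in v.children)
```

Storing the case label on the child vertex is one possible design choice, because every vertex other than the root has exactly one edge to its parent.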

In Fig. 4, as the case information, SUBJ (subjective case),
OBJ (objective case), OBL (oblique case), and NULL (empty) are
indicated. Alternatively, as the case information, an ADJUNCT
(adjunct case) may be employed.

In this embodiment, in order to obtain differences between
a tree $T_a$ and a tree $T_b$, consider a set of mappings from the
tree $T_a$ to the tree $T_b$ that satisfy a predetermined condition.
Generally, in a mapping between two different trees,
substitution, deletion, and/or insertion of vertexes and
substitution, deletion, and/or insertion of edges occur. For
example, in Fig. 10, a vertex "Hanako" and an edge "ADJUNCT"
of the left tree are deleted. When weights are set with respect
to the substitution, the deletion, and the insertion,
differences between two trees can be evaluated using the weights.
In this embodiment, this evaluation of the differences is
referred to as "a distance between two trees". For example,
a mapping $M_{R\min}$, which has the minimum sum of the weights,
is obtained from among a mapping set $M_R$ satisfying a predetermined
condition that "the mapping is a one-to-one mapping, the parent-child
relationship (hierarchical relationship) is preserved, the
structure is preserved, and mappings between vertexes and mappings
between edges do not intersect with each other", and then the
sum of the weights of the mapping $M_{R\min}$ is defined as the
distance between R trees. Also, a mapping $M_{RO\min}$, which has
the minimum sum of the weights, is obtained from among a mapping
set $M_{RO}$ satisfying another predetermined condition that "the
mapping is a one-to-one mapping, the parent-child relationship
(hierarchical relationship) is preserved, the right/left
relationship between brothers is preserved, the structure is
preserved, and mappings between vertexes and mappings between
edges do not intersect with each other", and then the sum of the
weights of the mapping $M_{RO\min}$ is defined as the distance
between RO trees.

The word-mapping-weight calculation section 5 calculates
a word substitution weight, a word deletion weight, and a word
insertion weight in response to a request from the distance
calculation section 7, and then provides these calculated
weights to the distance calculation section 7.

The word substitution weight may be a constant or may
be set by using a distance between words. In the former case,
when two words are the same word, the word substitution weight
is set to zero. Conversely, when two words are not identical
to each other, the word substitution weight is set to a positive
constant. In the latter case, the word-mapping-weight
calculation section 5 obtains a distance between two words,
and provides a value of the obtained distance to the distance
calculation section 7 as the word substitution weight.
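The two ways of setting the word substitution weight can be sketched as follows. This is only an illustrative sketch: the function name `word_substitution_weight`, the default constant of 1.0, and the optional `word_distance` callback are assumptions made for the example, not values given in this disclosure.

```python
from typing import Callable, Optional


def word_substitution_weight(
    w1: str,
    w2: str,
    constant: float = 1.0,  # hypothetical positive constant
    word_distance: Optional[Callable[[str, str], float]] = None,
) -> float:
    """Return the weight for substituting word w1 with word w2."""
    if word_distance is not None:
        # Latter case: the weight is the distance between the two words.
        return word_distance(w1, w2)
    # Former case: zero for identical words, a positive constant otherwise.
    return 0.0 if w1 == w2 else constant
```

For instance, `word_substitution_weight("teacher", "teacher")` evaluates to zero in the constant setting, whereas any two different words receive the same positive constant.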

As a method of obtaining a distance between words,
any known method may be utilized. For instance, there
are a statistical method, a method using a thesaurus dictionary,
and a method using a neural network. As the statistical method,
for instance, the distance between the words may be obtained
by employing the tf-idf method described in the non-patent
publication 2 (see non-patent publication 2). As the method
using the thesaurus dictionary, for example, the length of the
minimum path between the concepts to which two words belong may
be set as the distance between the words. As the method using
the neural network, for instance, the method described in the
non-patent publication 3 (see non-patent publication 3) may
be employed. Also, other known methods may be used.
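The thesaurus-based distance mentioned above (the length of the minimum path between two concepts) can be sketched with a breadth-first search. This is a minimal sketch under the assumption that the thesaurus is available as an undirected adjacency mapping between concepts; the function name `concept_path_length` and the toy thesaurus in the usage note are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def concept_path_length(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[int]:
    """Return the number of links on a shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two concepts are not connected
```

With a toy thesaurus such as `{"person": ["teacher", "student"], "teacher": ["person"], "student": ["person"]}`, the concepts "teacher" and "student" are two links apart.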

The word deletion weight may be a constant. Alternatively,
the word deletion weight may be set in accordance with
part-of-speech information of a word. In the latter case, a
weight is allotted to the part-of-speech of a word, and the word
deletion weight is the product of the part-of-speech weight and a
constant. As a part-of-speech weight setting operation, for
instance, it is preferable to apply a large weight to a part
of speech having an important role. As one example, the weight
of a verb may be set as the largest weight, with the
part-of-speech weights becoming smaller in the order of an
adjective verb, a noun, an adverb, and an adjective.
Alternatively, the part-of-speech weights may be set based upon
other orders.
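The part-of-speech based setting can be sketched as a table lookup followed by a multiplication. The numeric weights below are hypothetical values chosen only to respect the example ordering (verb largest, then adjective verb, noun, adverb, adjective); the names `POS_WEIGHT` and `word_deletion_weight` are likewise assumptions for illustration.

```python
# Hypothetical part-of-speech weights; only their ordering follows the text.
POS_WEIGHT = {
    "verb": 5.0,
    "adjective verb": 4.0,
    "noun": 3.0,
    "adverb": 2.0,
    "adjective": 1.0,
}


def word_deletion_weight(pos: str, constant: float = 1.0, default: float = 1.0) -> float:
    """Return the part-of-speech weight multiplied by a constant.

    Parts of speech missing from the table fall back to `default`
    (an assumption of this sketch).
    """
    return POS_WEIGHT.get(pos, default) * constant
```

The same function could serve for the word insertion weight if the same part-of-speech weight setting method is adopted for insertion.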

The word insertion weight may be a constant.
Alternatively, the word insertion weight may be set based upon
part-of-speech information of a word. In the latter case, a
weight is allotted to the part-of-speech of a word. The word
insertion weight is the product of the part-of-speech weight and
a constant. As a part-of-speech weight setting method, a method
similar to the part-of-speech weight setting method, which has
been described with respect to the word deletion weight, may
be used. Alternatively, the part-of-speech weight may be set
based upon other different methods.

The case-mapping-weight calculation section 6 calculates
a case substitution weight, a case deletion weight, and a case
insertion weight in response to a request from the distance
calculation section 7. Then, the case-mapping-weight
calculation section 6 provides these calculated weights to the
distance calculation section 7.

The case substitution weight may be a constant.
Alternatively, the case substitution weight may be set using
a distance between cases. In the former case, when two cases
are the same case, the case substitution weight is set to zero.
Conversely, when two cases are not identical to each other,
the case substitution weight is set to a positive constant.
In the latter case, the case-mapping-weight calculation section
6 obtains a distance between two cases and provides a value
of the obtained distance to the distance calculation section
7 as the case substitution weight.

In this case, one example of a method for obtaining the
distance between cases will be given.

First, all of the cases are classified into several categories
depending upon the contents thereof. It should be noted that the
number of elements in each category is not less than 1.

Also, a table of distances among the case categories as
shown in Fig. 5 is prepared. In the table shown in Fig. 5,
with respect to all of the combinations of a plurality (namely,
m pieces) of case categories, the distances (i.e., distance
value 11 to distance value mm) among the case categories are
set.

Next, the case categories to which the two cases belong,
respectively, are obtained, which are specified based upon the two
pieces of given case information. Also, the distance value
between the two acquired case categories is obtained from the
table. Thus, this obtained distance value may be set as the
distance between the two cases.
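The category-table method above amounts to two lookups. The sketch below assumes a hypothetical grouping of the cases into three categories and hypothetical distance values; the actual categories and the values of Fig. 5 are not given in the text, so `CASE_CATEGORY`, `CATEGORY_DISTANCE`, and `case_distance` are illustrative names only.

```python
# Hypothetical assignment of cases to category indices (each category has >= 1 case).
CASE_CATEGORY = {"SUBJ": 0, "OBJ": 1, "OBL": 1, "ADJUNCT": 2}

# Hypothetical m x m table of distances among the case categories (m = 3 here).
CATEGORY_DISTANCE = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]


def case_distance(case1: str, case2: str) -> float:
    """Look up the categories of both cases, then the distance between the categories."""
    return CATEGORY_DISTANCE[CASE_CATEGORY[case1]][CASE_CATEGORY[case2]]
```

Note that under this scheme two different cases in the same category (OBJ and OBL in the hypothetical grouping) receive whatever value the table holds on its diagonal.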

It should also be noted that another method may be employed
as a method of obtaining the distance between cases.

The case deletion weight may be a constant. Alternatively,
the case deletion weight may be set in accordance with the sort
of a case. In the latter case, a weight is allotted to a case
and the case deletion weight is the product of the case weight and
a constant. As the setting of the case weights, for example, the
weight of SUBJ may be set as the largest weight, and the weights
may become smaller in the order of OBJ, OBL, and ADJUNCT.
Alternatively, the case weights may be set based upon other orders.

The case insertion weight may be a constant.
Alternatively, the case insertion weight may be set in accordance
with the sort of a case. In the latter case, a weight is allotted
to a case and the case insertion weight is set as the product
of the case weight and a constant. As the setting of the case
weights, for example, a setting method similar to the method of
setting the case weight as described with respect to the case
deletion weight may be used. Also, the case insertion weight
may be set based upon other different setting methods.

The distance calculation section 7 calculates a distance
between either RO trees or R trees stored in the memory 16 and
the memory 17, and then stores the calculation result into the
memory 18. If the word substitution weight, the word deletion
weight, the word insertion weight, the case substitution weight,
the case deletion weight, and the case insertion weight are
required to calculate the distance between trees, the distance
calculation section 7 outputs the word information and the case
information of the two text sentences to be compared, together
with a calculation request, to the word-mapping-weight
calculation section 5 and the case-mapping-weight calculation
section 6. Upon receiving the calculation request, the
word-mapping-weight calculation section 5 and the
case-mapping-weight calculation section 6 conduct the calculation
and output the required information to the distance calculation
section 7.

Next, the definition of a distance between RO trees based
on the correspondence relation between vertexes and edges, the
definition of a distance between R trees based on the
correspondence relation between vertexes and edges, and a method
for obtaining each distance will be described.

First, in order to describe the definition of a distance
between trees and the method for obtaining the distance between
trees, the related symbols are defined as follows:

The number of edges in the path from the root to a vertex $x$ is
defined as the depth of $x$, denoted by $dep(x)$. The depth of
the root is 0. A vertex having the depth $dep(x)+1$ is called
a child of $x$; the set of children of a vertex $x$ is denoted by
$Ch(x)$. A vertex that does not have a child is called a leaf.
A vertex having the depth $dep(x)-1$ is called a parent of $x$,
denoted by $pa(x)$. Writing $pa^2(x) = pa(pa(x))$, a
component of $An(x) = \bigcup_{i=1,\ldots,dep(x)} \{pa^i(x)\}$ is called
an ancestor of $x$.
An edge between a vertex $x$ and its parent is denoted by $\bar{x}$. When
$x$ is the root, $\bar{x}$ is empty. Here, it is assumed that a vertex $x$ is
not a parent of the vertex $x$ itself. The set of ancestors $An(\bar{x})$ of the
edge $\bar{x}$ can be defined similarly. The set of edges from an edge
$\bar{x}$ to an ancestor $\bar{u}$ of the edge $\bar{x}$ is denoted by $path(\bar{x}, \bar{u})$.
Labels are allotted to the vertexes and edges of a tree, respectively.
The labels of a vertex $x$ and an edge $\bar{x}$ are denoted by $lab(x)$ and
$lab(\bar{x})$, respectively. In the invention, the label of a vertex
indicates word information and the label of an edge indicates
case information. When two arbitrary vertexes $x_1$ and $x_2$
($x_1 \neq x_2$) satisfy $x_1 \notin An(x_2)$ and $x_2 \notin An(x_1)$, we say that
$x_1$ and $x_2$ are separated, which is denoted by $sep(x_1, x_2)$. Similarly, when
two arbitrary edges $\bar{x}_1$ and $\bar{x}_2$ ($\bar{x}_1 \neq \bar{x}_2$) satisfy
$\bar{x}_1 \notin An(\bar{x}_2)$ and $\bar{x}_2 \notin An(\bar{x}_1)$, we also say that
$\bar{x}_1$ and $\bar{x}_2$ are separated, which is denoted by $sep(\bar{x}_1, \bar{x}_2)$.
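The depth, ancestor, and separation relations can be sketched for vertexes as follows. This is a minimal sketch, assuming the tree is stored as a hypothetical `parent` dictionary that maps each vertex to its parent and the root to `None`; the function names mirror the symbols above but are otherwise illustrative.

```python
from typing import Dict, Hashable, List, Optional

Parent = Dict[Hashable, Optional[Hashable]]


def ancestors(parent: Parent, x: Hashable) -> List[Hashable]:
    """An(x): pa(x), pa(pa(x)), ... up to the root."""
    result = []
    while parent.get(x) is not None:
        x = parent[x]
        result.append(x)
    return result


def dep(parent: Parent, x: Hashable) -> int:
    """dep(x): number of edges on the path from the root to x."""
    return len(ancestors(parent, x))


def sep(parent: Parent, x1: Hashable, x2: Hashable) -> bool:
    """sep(x1, x2): distinct vertexes, neither of which is an ancestor of the other."""
    return (
        x1 != x2
        and x1 not in ancestors(parent, x2)
        and x2 not in ancestors(parent, x1)
    )
```

For the tree of Fig. 4, the vertexes "a teacher" and "English" are separated, while "teach" and "English" are not, because "teach" is an ancestor of "English".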

A subtree of a tree $T_a$ that has a vertex $x$ as
its root is denoted by $T_a(x)$.

The set of vertexes of the subtree $T_a(x)$ is denoted by $V_a(x)$.

The set of edges of the subtree $T_a(x)$ is denoted by $E_a(x)$.

A part that consists of the subtree $T_a(x)$ and the edge
$\bar{x}$ is denoted by $\bar{T}_a(x)$, and $\bar{T}_a(x)$ is also called a subtree.

The set of vertexes of the subtree $\bar{T}_a(x)$ is denoted by $\bar{V}_a(x)$,
and the set of edges of the subtree $\bar{T}_a(x)$ is denoted by $\bar{E}_a(x)$. In
this case, $\bar{V}_a(x) = V_a(x)$ and $E_a(x) \cup \{\bar{x}\} = \bar{E}_a(x)$ are established.
A part that consists of subtrees $\bar{T}_a(x_1), \bar{T}_a(x_2), \ldots, \bar{T}_a(x_k)$
(where $x_1, \ldots, x_k$ are the children of $x$) is called a forest, and the
forest is denoted by $F_a(x)$.

In a mapping from a tree $T_a$ to a tree $T_b$, a vertex $x$ or
an edge $\bar{x}$ of the tree $T_a$ may be moved into the tree $T_b$ with no
changes. These translations are defined as follows. A
translation in which a vertex $x$ of $T_a$ is changed into a vertex
$y$ of $T_b$ is denoted by $(x, y)$, and it is said that the vertex $x$ maps
to the vertex $y$. At this time, the label of the vertex may be
changed. A translation in which an edge $\bar{x}$ of $T_a$ is changed to
an edge $\bar{y}$ of $T_b$ is denoted by $(\bar{x}, \bar{y})$, and it is said that the edge
$\bar{x}$ maps to the edge $\bar{y}$. At this time, the label of the edge may
be changed. The above-described mapping $M$ is a set of $(x, y)$
and $(\bar{x}, \bar{y})$.

With respect to subtrees $T_a(x)$, $T_b(y)$, $\bar{T}_a(x)$, and $\bar{T}_b(y)$,
$J_V(x)$, $I_V(y)$, $J_E(x)$, and $I_E(y)$ are defined as follows.

$J_V(x) = \{\, j \mid (i, j) \in M,\ i \in V_a(x) \,\}$

$I_V(y) = \{\, i \mid (i, j) \in M,\ j \in V_b(y) \,\}$

$J_E(x) = \{\, \bar{j} \mid (\bar{i}, \bar{j}) \in M,\ \bar{i} \in \bar{E}_a(x) \,\}$

$I_E(y) = \{\, \bar{i} \mid (\bar{i}, \bar{j}) \in M,\ \bar{j} \in \bar{E}_b(y) \,\}$

Here, $J_V(x)$ indicates the set of image vertexes of the subtree $T_a(x)$.
$I_V(y)$ indicates the set of inverse image vertexes of the subtree
$T_b(y)$. Similarly, $J_E(x)$ indicates the set of image edges to which
the edges of the subtree $\bar{T}_a(x)$ are mapped. $I_E(y)$ indicates the set of
inverse image edges of the subtree $\bar{T}_b(y)$.

Assume that the smallest subtree including $J_V(x)$ is
$T_b(y_x)$. Note that $y_x$ is the root of the subtree $T_b(y_x)$. When
$J_V(x) = \emptyset$, $y_x$ cannot be decided. Similarly, $x_y$ is defined:
$T_a(x_y)$ is the smallest subtree including $I_V(y)$. We can say that
$T_b(y_x)$ is the smallest subtree of $T_b$ that includes all of the mapping
vertexes of the subtree $T_a(x)$, and $T_a(x_y)$ is the smallest subtree
of $T_a$ that includes all of the mapping vertexes of the subtree
$T_b(y)$. With regard to edges, the smallest subtree including
$J_E(x)$ is denoted by $\bar{T}_b(\bar{y}_{\bar{x}})$. Note that $\bar{y}_{\bar{x}}$ is the root edge of the
subtree $\bar{T}_b(\bar{y}_{\bar{x}})$. When $J_E(x) = \emptyset$, $\bar{y}_{\bar{x}}$ cannot be decided.
Similarly, $\bar{x}_{\bar{y}}$ is defined: $\bar{T}_a(\bar{x}_{\bar{y}})$ is the smallest subtree
including $I_E(y)$.

First, a description will be made of the
method for calculating the distance between RO trees based
on the correspondence relationship between the vertexes and
the edges.

Assume that the predetermined condition that a mapping
$M$ between RO trees should satisfy includes the following
conditions. Also, with regard to a mapping $M$ between ordered
forests $F_a(x)$ and $F_b(y)$, $M \cup \{(x, y)\}$ satisfies the following
mapping conditions (a1) to (a11).

For any $(x_1, y_1) \in M$, $(x_2, y_2) \in M$,

(a1) $x_1 = x_2$ iff $y_1 = y_2$

(a2) $x_1 \in An(x_2)$ iff $y_1 \in An(y_2)$

(a3) "$x_1$ is located on the left of $x_2$" iff "$y_1$ is located
on the left of $y_2$"

For any mapping vertex $x_1$ and any vertex $x_2$ in $T_a$,

(a4) if $y_{x_1}$ and $y_{x_2}$ can be decided,
$x_1 \in An(x_2)$ iff $y_{x_1} \in An(y_{x_2})$ and
$sep(x_1, x_2)$ iff $sep(y_{x_1}, y_{x_2})$

For any mapping vertex $y_1$ and any vertex $y_2$ in $T_b$,

(a5) if $x_{y_1}$ and $x_{y_2}$ can be decided,
$x_{y_1} \in An(x_{y_2})$ iff $y_1 \in An(y_2)$ and
$sep(x_{y_1}, x_{y_2})$ iff $sep(y_1, y_2)$

For any $(\bar{x}_1, \bar{y}_1) \in M$, $(\bar{x}_2, \bar{y}_2) \in M$,

(a6) $\bar{x}_1 = \bar{x}_2$ iff $\bar{y}_1 = \bar{y}_2$

(a7) $\bar{x}_1 \in An(\bar{x}_2)$ iff $\bar{y}_1 \in An(\bar{y}_2)$

(a8) "$\bar{x}_1$ is located on the left of $\bar{x}_2$" iff "$\bar{y}_1$ is located
on the left of $\bar{y}_2$"

For any mapping edge $\bar{x}_1$ and any edge $\bar{x}_2$ in $T_a$,

(a9) if $\bar{y}_{\bar{x}_1}$ and $\bar{y}_{\bar{x}_2}$ can be decided,
$\bar{x}_1 \in An(\bar{x}_2)$ iff $\bar{y}_{\bar{x}_1} \in An(\bar{y}_{\bar{x}_2})$ and
$sep(\bar{x}_1, \bar{x}_2)$ iff $sep(\bar{y}_{\bar{x}_1}, \bar{y}_{\bar{x}_2})$

For any mapping edge $\bar{y}_1$ and any edge $\bar{y}_2$ in $T_b$,

(a10) if $\bar{x}_{\bar{y}_1}$ and $\bar{x}_{\bar{y}_2}$ can be decided,
$\bar{x}_{\bar{y}_1} \in An(\bar{x}_{\bar{y}_2})$ iff $\bar{y}_1 \in An(\bar{y}_2)$ and
$sep(\bar{x}_{\bar{y}_1}, \bar{x}_{\bar{y}_2})$ iff $sep(\bar{y}_1, \bar{y}_2)$

(a11) if $(x, y) \in M$,
$J_V(x) \subseteq T_b(y)$, $J_E(x) \subseteq \bar{T}_b(y)$, $I_V(y) \subseteq T_a(x)$ and
$I_E(y) \subseteq \bar{T}_a(x)$

(a1) and (a6) are one-to-one mapping conditions. (a2)
and (a7) guarantee that the mapping preserves the relation between
ancestors and descendants. A mapping shown in Fig. 11A is a
one-to-many mapping and therefore is prohibited. Also, since
a mapping shown in Fig. 11B does not preserve the relation between
ancestors and descendants, this mapping does not meet (a2).
(a3) and (a8) guarantee that the mapping preserves the relation
between left and right. With regard to a mapping shown in Fig.
11C, although a vertex 5 is located on the right of a vertex 3,
the image 5' is located on the left of the image 6'. This mapping
does not preserve the relation between left and right.
(a4), (a5), (a9), and (a10) guarantee that the mapping images of
two subtrees that are separated from each other, and/or the inverse
images of the two subtrees, are separated from each other. A
mapping shown in Fig. 11D meets (a1) to (a3). However, for
vertexes 1' and 4', both of the smallest subtrees including the inverse
images of subtrees $T_b(1')$ and $T_b(4')$ are the subtree $T_a(2)$, and the
vertex 1' is an ancestor of the vertex 4'. Therefore, this
mapping does not meet the mapping condition (a5). A mapping
shown in Fig. 11E also does not meet (a5). The mapping condition
(a11) prohibits mappings between vertexes and mappings between
edges from intersecting with each other. The mapping condition
(a11) guarantees that the smallest subtree including an image
of a mapping vertex in a subtree is not separated from the smallest
subtree including an image of a mapping edge in the same subtree.
For example, with regard to a mapping shown in Fig. 11F, a mapping
from a vertex 5 to a vertex 6' intersects with a mapping from
an edge 5 to an edge 7. Therefore, this mapping is prohibited.
With regard to a mapping $M = \{(2,4'), (3,5'), (5,6'), (\bar{5},\bar{3}')\}$ shown in Fig.
11G, regardless of mapping an edge 5 to an edge 3' ($(\bar{5},\bar{3}') \in M$),
the smallest subtree $T_b(6')$ including the images of all mapping
vertexes in $T_a(5)$ is separated from the smallest subtree $\bar{T}_b(3')$
including the images of all mapping edges. Therefore, this mapping
does not meet the mapping condition (a11). A mapping
$M = \{(2,4'), (3,5'), (5,6'), (\bar{3},\bar{5}'), (\bar{5},\bar{6}')\}$ shown in Fig. 11H satisfies the
above mapping conditions (a1) to (a11).
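The vertex conditions (a1) to (a3) can be checked mechanically for a candidate mapping. The sketch below covers only these three conditions and only vertex pairs; it assumes each ordered tree is given as a hypothetical dictionary from a vertex to the ordered list of its children, and it interprets "located on the left of" as "separated and earlier in depth-first order", which is an assumption of this sketch rather than a definition taken from the text.

```python
from itertools import product


def index_tree(children, root):
    """Return (parent map, depth-first position) for an ordered tree."""
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        kids = children.get(v, [])
        for c in kids:
            parent[c] = v
        stack.extend(reversed(kids))  # leftmost child is visited first
    return parent, {v: i for i, v in enumerate(order)}


def is_ancestor(parent, a, x):
    """True if a is a proper ancestor of x."""
    x = parent[x]
    while x is not None:
        if x == a:
            return True
        x = parent[x]
    return False


def left_of(parent, pos, a, b):
    """Assumed meaning: a and b are separated and a comes first in depth-first order."""
    return (a != b and not is_ancestor(parent, a, b)
            and not is_ancestor(parent, b, a) and pos[a] < pos[b])


def satisfies_a1_to_a3(tree_a, root_a, tree_b, root_b, mapping):
    """mapping is a list of (x, y) vertex pairs from tree a to tree b."""
    pa, pos_a = index_tree(tree_a, root_a)
    pb, pos_b = index_tree(tree_b, root_b)
    for (x1, y1), (x2, y2) in product(mapping, repeat=2):
        if (x1 == x2) != (y1 == y2):                                  # (a1)
            return False
        if is_ancestor(pa, x1, x2) != is_ancestor(pb, y1, y2):        # (a2)
            return False
        if left_of(pa, pos_a, x1, x2) != left_of(pb, pos_b, y1, y2):  # (a3)
            return False
    return True
```

Representing the mapping as a list of pairs rather than a dictionary lets a one-to-many candidate be expressed and rejected by the (a1) check.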

Fig. 6A to Fig. 6D show four modes of two subtrees, which 
are RO trees. 

With respect to a distance between subtrees, which are
RO trees, and a distance between ordered forests, the following
definitions are made:

A distance between two subtrees $T_a(x)$ and $T_b(y)$, which are
the RO trees, shown in Fig. 6A is expressed by $D(T_a(x), T_b(y))$.

A distance between two subtrees $\bar{T}_a(x)$ and $T_b(y)$, which are
the RO trees, shown in Fig. 6B is expressed by $D(\bar{T}_a(x), T_b(y))$.

A distance between two subtrees $\bar{T}_a(x)$ and $\bar{T}_b(y)$, which are
the RO trees, shown in Fig. 6C is expressed by $D(\bar{T}_a(x), \bar{T}_b(y))$.

A distance between two subtrees $T_a(x)$ and $\bar{T}_b(y)$, which are
the RO trees, shown in Fig. 6D is expressed by $D(T_a(x), \bar{T}_b(y))$.
Among all mappings from $T_a(x)$ to $T_b(y)$ satisfying the
mapping conditions (a1) to (a11), the minimum value of the weight
that is required for the mapping is defined as the distance
$D(T_a(x), T_b(y))$. Among all mappings from $\bar{T}_a(x)$ to $T_b(y)$ satisfying
the mapping conditions (a1) to (a11), the minimum value of the
weight that is required for the mapping is defined as the
distance $D(\bar{T}_a(x), T_b(y))$. Among all mappings from $\bar{T}_a(x)$ to $\bar{T}_b(y)$
satisfying the mapping conditions (a1) to (a11), the minimum
value of the weight that is required for the mapping is defined
as the distance $D(\bar{T}_a(x), \bar{T}_b(y))$. Among all mappings from $T_a(x)$ to
$\bar{T}_b(y)$ satisfying the mapping conditions (a1) to (a11), the
minimum value of the weight that is required for the mapping
is defined as the distance $D(T_a(x), \bar{T}_b(y))$. Similarly, among all
mappings from the forest $F_a(x)$ to the forest $F_b(y)$ satisfying the
mapping conditions (a1) to (a11), the minimum value of the weight
that is required for the mapping is defined as the distance
$D(F_a(x), F_b(y))$.

The method for obtaining the distance between RO trees
first allots numbers to the vertexes and edges, starting from the
root of the RO tree, by way of depth-first searching. The
distances between the smallest subtrees (each consisting of one
vertex) are computed first; then, using those results, the
distances between larger subtrees are computed; and finally, the
distance between the two RO trees is obtained.
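The numbering and the bottom-up processing order can be sketched as follows. This is a minimal sketch, assuming the ordered tree is given as a hypothetical dictionary from a vertex to the ordered list of its children; the function names are illustrative. Because a depth-first numbering from the root always gives a child a larger number than its parent, visiting the vertexes in descending number order handles every subtree before the larger subtree that contains it.

```python
def depth_first_numbering(children, root):
    """Allot the numbers 1..n to the vertexes in depth-first order from the root."""
    number = {}
    stack = [root]
    while stack:
        v = stack.pop()
        number[v] = len(number) + 1
        # Push children in reverse so that the leftmost child is numbered first.
        stack.extend(reversed(children.get(v, [])))
    return number


def bottom_up_order(children, root):
    """List the vertexes from the largest number down to 1 (smallest subtrees first)."""
    number = depth_first_numbering(children, root)
    return sorted(number, key=number.get, reverse=True)
```

For the tree of Fig. 4 this numbers "teach" as 1 and its three children as 2 to 4, and the bottom-up order ends with the root.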

The distances between the two RO trees shown
in Fig. 6A to Fig. 6D can be obtained using the following formulae
1 to 4. In this case, the symbol "A-B" shown in the formulae 1 to
4 indicates a function for removing all components of a set B
from a set A.

Also, it is assumed that the distance between ordered forests
$D(F_a(x), F_b(y))$ and the distances between all subtrees
$D(T_a(x), T_b(y_j))$, $D(\bar{T}_a(x), T_b(y_j))$, $D(\bar{T}_a(x), \bar{T}_b(y_j))$,
and $D(T_a(x), \bar{T}_b(y_j))$ have already been obtained.

As can be seen from the tree structure expression method
and the conversion method of the text sentence described above,
vertexes $x$ and $y$ indicate words (including attributes of the
words) appearing in the text sentence. Edges $\bar{x}$ and $\bar{y}$ indicate
dependency relation information (case information) between
words of the text sentence. A function $\delta(x, y)$ represents
a vertex substitution weight and can be obtained using the word
substitution weight. A function $g(y)$ represents an
insertion weight of a vertex $y$ and can be obtained using the
word insertion weight. A function $r(x)$ represents a
deletion weight of a vertex $x$ and can be obtained using the
word deletion weight.

The weights for edges are defined as follows.

A function $\delta(\bar{x}, \bar{y})$ represents an edge substitution weight
and can be obtained using the case substitution weight. A
function $g(\bar{y})$ represents an insertion weight of an edge $\bar{y}$ and
can be obtained using the case insertion weight. A function
$r(\bar{x})$ represents a deletion weight of an edge $\bar{x}$ and can be
obtained using the case deletion weight.

The vertex substitution weight, the vertex insertion
weight, the vertex deletion weight, the case substitution weight,
the case insertion weight, and the case deletion weight are
non-negative values and satisfy $\delta(x, y) \le r(x) + g(y)$ and
$\delta(\bar{x}, \bar{y}) \le r(\bar{x}) + g(\bar{y})$.



$D(T_a(x), T_b(y)) = \min\{\ \ldots\ \}$   ... (1)

... (2)

... (3)

... (4)

[The remaining terms of the formulae 1 to 4 are not legible in the source text.]

Fig. 7 shows, for example, two ordered forests. The
distance $D(F_a(x), F_b(y))$ between these forests can be calculated
using the following formula 5. In this case, the symbol $|A|$
indicates the total number of components of a set $A$.

(5-1) boundary condition ($1 \le i \le |Ch(x)|$, $1 \le j \le |Ch(y)|$)

$d_1(0, 0) = 0$; [the remaining boundary terms are not legible in the source text]

(5-2) calculation of $d_1(i, j)$ ($1 \le i \le |Ch(x)|$, $1 \le j \le |Ch(y)|$)

$d_1(i, j) = \min\{\, d_1(i, j-1) + \ldots \,\}$ [the remaining terms are not legible in the source text]

(5-3)

$D(F_a(x), F_b(y)) = d_1(|Ch(x)|, |Ch(y)|)$

... (5)
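The shape of formula 5 (a boundary condition, a minimum taken over neighbouring table entries, and a final read-out at $(|Ch(x)|, |Ch(y)|)$) matches a sequence-alignment dynamic program over the child subtrees of the two ordered forests. Since the terms of (5-2) are not legible here, the following is only a sketch of that generic pattern and not the formula of this disclosure: it assumes that the cost of deleting each whole subtree of $F_a(x)$, the cost of inserting each whole subtree of $F_b(y)$, and the pairwise subtree distances have already been computed, and the names `del_cost`, `ins_cost`, and `tree_dist` are hypothetical.

```python
def ordered_forest_distance(del_cost, ins_cost, tree_dist):
    """Align two ordered lists of subtrees, left to right.

    del_cost[i]     : assumed cost of deleting the i-th subtree of the first forest
    ins_cost[j]     : assumed cost of inserting the j-th subtree of the second forest
    tree_dist[i][j] : assumed, already known distance between those two subtrees
    """
    m, n = len(del_cost), len(ins_cost)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]  # d[0][0] = 0 is the boundary condition
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost[i - 1]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost[j - 1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost[i - 1],           # delete the i-th subtree
                d[i][j - 1] + ins_cost[j - 1],           # insert the j-th subtree
                d[i - 1][j - 1] + tree_dist[i - 1][j - 1],  # map i-th onto j-th
            )
    return d[m][n]
```

Because the table is filled from left to right, the left/right order of the subtrees is preserved by construction, which is the property that distinguishes the ordered case from the unordered one.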



With regard to the formula 1, when the vertex $x$ is a leaf
($Ch(x) = \emptyset$: empty set), the second term of
the right-hand side of the formula 1 need not be calculated, so the
distance $D(T_a(x), T_b(y))$ can be calculated using formula 6.

Also, in the formula 1, when the vertex $y$ is a leaf
($Ch(y) = \emptyset$: empty set), the third term of
the right-hand side of the formula 1 need not be calculated, so the
distance $D(T_a(x), T_b(y))$ can be calculated using formula 7.

... (6)

... (7)

Similarly, in the formula 2, when the vertex $x$ is a leaf
($Ch(x) = \emptyset$: empty set), the second term of
the right-hand side of the formula 2 need not be calculated, so the
distance can be calculated using formula 8.

Also, in the formula 2, when the vertex $y$ is a leaf
($Ch(y) = \emptyset$: empty set), the third term of
the right-hand side of the formula 2 need not be calculated, so the
distance can be calculated using formula 9.

... (8)

... (9)



Similarly, in the formula 3, when the vertex $x$ is a leaf
($Ch(x) = \emptyset$: empty set), the second term of
the right-hand side of the formula 3 need not be calculated, so the
distance can be calculated using formula 10.

Also, in the formula 3, when the vertex $y$ is a leaf
($Ch(y) = \emptyset$: empty set), the third term of
the right-hand side of the formula 3 need not be calculated, so the
distance can be calculated using formula 11.

... (10)

$D(\bar{T}_a(x), \bar{T}_b(y)) = \min\{\ \ldots\ \}$   ... (11)

Similarly, in the formula 4, when the vertex $x$ is a leaf
($Ch(x) = \emptyset$: empty set), the second term of
the right-hand side of the formula 4 need not be calculated, so the
distance can be calculated using formula 12.

Also, in the formula 4, when the vertex $y$ is a leaf
($Ch(y) = \emptyset$: empty set), the third term of
the right-hand side of the formula 4 need not be calculated, so the
distance can be calculated using formula 13.

... (12)

... (13)

Next, a description will be made of the method for
calculating a distance between R trees based on the
correspondence relation between vertexes and edges. It should
be noted that the symbols are expressed in substantially the same
way as in the method for calculating the distance between RO
trees described above. Therefore, the description concerning the
expression of the symbols is omitted.

Assume that the predetermined condition that a mapping
$M$ between R trees should satisfy includes the following
conditions. Also, with regard to a mapping $M$ between unordered
forests $F_a(x)$ and $F_b(y)$, $M \cup \{(x, y)\}$ satisfies the following
mapping conditions (b1) to (b9).

For any $(x_1, y_1) \in M$, $(x_2, y_2) \in M$,

(b1) $x_1 = x_2$ iff $y_1 = y_2$

(b2) $x_1 \in An(x_2)$ iff $y_1 \in An(y_2)$

For any mapping vertex $x_1$ and any vertex $x_2$ in $T_a$,

(b3) if $y_{x_1}$ and $y_{x_2}$ can be decided,
$x_1 \in An(x_2)$ iff $y_{x_1} \in An(y_{x_2})$ and
$sep(x_1, x_2)$ iff $sep(y_{x_1}, y_{x_2})$

For any mapping vertex $y_1$ and any vertex $y_2$ in $T_b$,

(b4) if $x_{y_1}$ and $x_{y_2}$ can be decided,
$x_{y_1} \in An(x_{y_2})$ iff $y_1 \in An(y_2)$ and
$sep(x_{y_1}, x_{y_2})$ iff $sep(y_1, y_2)$

For any $(\bar{x}_1, \bar{y}_1) \in M$, $(\bar{x}_2, \bar{y}_2) \in M$,

(b5) $\bar{x}_1 = \bar{x}_2$ iff $\bar{y}_1 = \bar{y}_2$

(b6) $\bar{x}_1 \in An(\bar{x}_2)$ iff $\bar{y}_1 \in An(\bar{y}_2)$

For any mapping edge $\bar{x}_1$ and any edge $\bar{x}_2$ in $T_a$,

(b7) if $\bar{y}_{\bar{x}_1}$ and $\bar{y}_{\bar{x}_2}$ can be decided,
$\bar{x}_1 \in An(\bar{x}_2)$ iff $\bar{y}_{\bar{x}_1} \in An(\bar{y}_{\bar{x}_2})$ and
$sep(\bar{x}_1, \bar{x}_2)$ iff $sep(\bar{y}_{\bar{x}_1}, \bar{y}_{\bar{x}_2})$

For any mapping edge $\bar{y}_1$ and any edge $\bar{y}_2$ in $T_b$,

(b8) if $\bar{x}_{\bar{y}_1}$ and $\bar{x}_{\bar{y}_2}$ can be decided,
$\bar{x}_{\bar{y}_1} \in An(\bar{x}_{\bar{y}_2})$ iff $\bar{y}_1 \in An(\bar{y}_2)$ and
$sep(\bar{x}_{\bar{y}_1}, \bar{x}_{\bar{y}_2})$ iff $sep(\bar{y}_1, \bar{y}_2)$

(b9) if $(x, y) \in M$,
$J_V(x) \subseteq T_b(y)$, $J_E(x) \subseteq \bar{T}_b(y)$, $I_V(y) \subseteq T_a(x)$ and
$I_E(y) \subseteq \bar{T}_a(x)$

(b1) and (b5) are one-to-one mapping conditions. (b2)
and (b6) guarantee that the mapping preserves the relation between
ancestors and descendants. (b3), (b4), (b7), and (b8) guarantee
that the mapping images of two subtrees that are separated from
each other, and/or the inverse images of the two subtrees, are
separated from each other. The mapping condition (b9) means that,
if $x$ maps to $y$, then the subtree $T_a(x)$ must map to $T_b(y)$, and $T_b(y)$
must be mapped from $T_a(x)$.

Among all mappings from $T_a(x)$ to $T_b(y)$ satisfying the
mapping conditions (b1) to (b9), the minimum value of the weight
that is required for the mapping is defined as the distance
$D(T_a(x), T_b(y))$. Among all mappings from $\bar{T}_a(x)$ to $T_b(y)$ satisfying
the mapping conditions (b1) to (b9), the minimum value of the weight
that is required for the mapping is defined as the distance
$D(\bar{T}_a(x), T_b(y))$. Among all mappings from $\bar{T}_a(x)$ to $\bar{T}_b(y)$ satisfying
the mapping conditions (b1) to (b9), the minimum value of the weight
that is required for the mapping is defined as the distance
$D(\bar{T}_a(x), \bar{T}_b(y))$. Among all mappings from $T_a(x)$ to $\bar{T}_b(y)$ satisfying
the mapping conditions (b1) to (b9), the minimum value of the weight
that is required for the mapping is defined as the distance
$D(T_a(x), \bar{T}_b(y))$. Similarly, among all mappings from a forest $F_a(x)$
to a forest $F_b(y)$ satisfying the mapping conditions (b1) to (b9),
the minimum value of the weight that is required for the mapping
is defined as the distance $D(F_a(x), F_b(y))$.

The method for obtaining the distance between R trees
first allots numbers to the vertexes and edges, starting from the
root of the R tree, by way of depth-first searching. The
distances between the smallest subtrees (each consisting of one
vertex) are computed first; then, using those results, the
distances between larger subtrees are computed; and finally, the
distance between the two R trees is obtained.

Also, it is assumed that the distance between unordered forests
$D(F_a(x), F_b(y))$ and the distances between all subtrees
$D(T_a(x_i), T_b(y))$, $D(\bar{T}_a(x_i), T_b(y))$, $D(\bar{T}_a(x_i), \bar{T}_b(y))$, $D(T_a(x_i), \bar{T}_b(y))$,
$D(T_a(x), T_b(y_j))$, $D(\bar{T}_a(x), T_b(y_j))$, $D(\bar{T}_a(x), \bar{T}_b(y_j))$, and $D(T_a(x), \bar{T}_b(y_j))$ have
already been obtained. The definitions of the vertex substitution
weight $\delta(x, y)$, the vertex insertion weight $g(y)$, the vertex
deletion weight $r(x)$, the edge substitution weight $\delta(\bar{x}, \bar{y})$, the edge
insertion weight $g(\bar{y})$, and the edge deletion weight $r(\bar{x})$ are the
same as those for RO trees.

A distance $D(T_a(x), T_b(y))$ between two subtrees $T_a(x)$ and
$T_b(y)$, which are R trees, shown in Fig. 6A can be calculated
using the formula 1. A distance $D(\bar{T}_a(x), T_b(y))$ between two
subtrees $\bar{T}_a(x)$ and $T_b(y)$, which are R trees, shown in Fig. 6B
can be calculated using the formula 2. A distance $D(\bar{T}_a(x), \bar{T}_b(y))$
between two subtrees $\bar{T}_a(x)$ and $\bar{T}_b(y)$, which are R trees, shown
in Fig. 6C can be calculated using the formula 3. A distance
$D(T_a(x), \bar{T}_b(y))$ between two subtrees $T_a(x)$ and $\bar{T}_b(y)$, which are R
trees, shown in Fig. 6D can be calculated using the formula
4.

A distance $D(F_a(x), F_b(y))$ between two unordered forests
shown in Fig. 7 can be calculated using a formula 14.

... (14)

Here, the symbol $w(M_{\max})$ shown in the formula 14 is the weight
of a maximum weight matching of a weighted bipartite graph
$G(X, Y, E)$ as shown in Fig. 8.

Also, the weight $w(e(x_i, y_j))$ of an edge $e(x_i, y_j)$ between the
vertex $x_i$ ($\in X$) and the vertex $y_j$ ($\in Y$) of the weighted bipartite
graph $G(X, Y, E)$ is set in accordance with the following formula
15.

... (15)

It should be understood that the vertex $x_i$ ($\in X$) of the
weighted bipartite graph $G(X, Y, E)$ represents a subtree
$\bar{T}_a(x_i)$ ($x_i \in Ch(x)$), which constitutes the unordered forest $F_a(x)$.
The vertex $y_j$ ($\in Y$) of the weighted bipartite graph $G(X, Y, E)$
indicates a subtree $\bar{T}_b(y_j)$ ($y_j \in Ch(y)$), which constitutes the
unordered forest $F_b(y)$.
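The role of the maximum weight matching can be illustrated with a small sketch. The formulae 14 and 15 themselves are not legible here, so the following assumes one commonly used form, which may differ from this disclosure: the edge weight between the subtree of $x_i$ and the subtree of $y_j$ is taken as the saving obtained by mapping one onto the other instead of deleting the first and inserting the second, and the forest distance is the cost of deleting and inserting everything minus the weight of the best matching. The names `del_cost`, `ins_cost`, and `tree_dist` are hypothetical, and the matching is found by brute force, which is only practical for the small numbers of children found in sentence trees.

```python
from itertools import permutations


def unordered_forest_distance(del_cost, ins_cost, tree_dist):
    """Distance between two unordered forests under the assumed formulation.

    del_cost[i]     : assumed cost of deleting the i-th subtree of the first forest
    ins_cost[j]     : assumed cost of inserting the j-th subtree of the second forest
    tree_dist[i][j] : assumed, already known distance between those two subtrees
    """
    m, n = len(del_cost), len(ins_cost)
    size = max(m, n)

    def gain(i, j):
        # Assumed edge weight: saving from mapping i onto j; padding entries save nothing.
        if i < m and j < n:
            return max(0.0, del_cost[i] + ins_cost[j] - tree_dist[i][j])
        return 0.0

    # Brute-force maximum weight matching on the padded square bipartite graph.
    best = 0.0
    for perm in permutations(range(size)):
        best = max(best, sum(gain(i, perm[i]) for i in range(size)))
    return sum(del_cost) + sum(ins_cost) - best
```

Unlike the ordered case, the second subtree of one forest may be matched with the first subtree of the other, so two forests that differ only in the order of their subtrees are at distance zero under this sketch.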

[Calculation procedure of the distance between RO trees] 

Next, a procedure for converting text sentences $S_1$ and
$S_2$ into RO trees to obtain a distance between the text sentences
$S_1$ and $S_2$ will be described with reference to a flow chart shown
in Fig. 12.

The two input text sentences S_1 and S_2 are converted into RO
trees T_a and T_b by using the morphological analysis section 2,
the semantic analysis section 3, and the tree structure conversion
section 4 (S01). As shown in Fig. 4, the word information is
allotted to the vertexes of the trees T_a and T_b, and the case
information is allotted to the edges of the trees T_a and T_b.
Numbers from 1 to n are allotted to the vertexes and edges of the
RO trees T_a and T_b (n denotes a positive integer). The numbers
are allotted in depth-first order from the root of the RO tree
(S02).
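
The depth-first numbering of step S02 can be sketched as follows. This is a minimal illustration, not the tree structure conversion section itself: the tuple-based tree representation, the function name number_depth_first, and the placeholder labels are hypothetical, and the sketch assumes that each edge shares the number of the child vertex it enters.

```python
def number_depth_first(tree):
    """Number the vertexes of a tree in depth-first (preorder) order.

    tree is a pair (word, children), where children is a list of
    (case, subtree) pairs in left-to-right order.
    Returns a list of (number, word, case of the incoming edge);
    the root has no incoming edge, so its case is None.
    """
    numbered = []

    def visit(node, incoming_case):
        word, children = node
        # The next free number goes to this vertex (and, by the
        # assumption of this sketch, to its incoming edge).
        numbered.append((len(numbered) + 1, word, incoming_case))
        for case, child in children:
            visit(child, case)

    visit(tree, None)
    return numbered


# Placeholder tree: root "r" with children "a" (which has child "b") and "c".
example = ("r", [("c1", ("a", [("c2", ("b", []))])), ("c3", ("c", []))])
numbering = number_depth_first(example)
print(numbering)
```

With this numbering the root always receives the number 1, and every descendant receives a larger number than its ancestors.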

Next, x is set to n1 and y is set to n2 (n1 and n2 are the number
of vertexes of the tree T_a and the number of vertexes of the tree
T_b, respectively) (S03 and S04). The distance D(F_a(x), F_b(y))
between an ordered forest F_a(x) and an ordered forest F_b(y) is
calculated using the formula 5 (S05). Incidentally, when distances
between trees, between subtrees, and between forests are
calculated, the distance calculation section 7 obtains the vertex
substitution weight S(x, y), the vertex deletion weight g(y), the
vertex insertion weight r(x), the edge substitution weight S(x, y),
the edge deletion weight g(y), and the edge insertion weight r(x)
from the word-mapping-weight calculation section 5 and the
case-mapping-weight calculation section 6 to calculate the
distance.

Subsequently, it is judged whether or not a subtree T_a(x) (or
\tilde{T}_a(x)) is a subtree consisting of one vertex (S06). If
yes, the process proceeds to S08. If no, the process proceeds to
S07.

In S07, it is judged whether or not a subtree T_b(y) (or
\tilde{T}_b(y)) is a subtree consisting of one vertex. If yes, the
process proceeds to S09. If no, the process proceeds to S10.

In S08, it is first judged whether or not the vertex y is the root
of the tree T_b. When the vertex y is not the root of the tree
T_b, the distance calculation section 7 calculates the distances
between subtrees in the RO trees D(T_a(x), T_b(y)),
D(\tilde{T}_a(x), \tilde{T}_b(y)), D(\tilde{T}_a(x), T_b(y)), and
D(T_a(x), \tilde{T}_b(y)) using the formulae (6), (8), (10), and
(12), respectively. On the other hand, when the vertex y is the
root of the tree T_b, the distance calculation section 7
calculates the distances between subtrees in the RO trees
D(T_a(x), T_b(y)) and D(\tilde{T}_a(x), T_b(y)) using the formulae
(6) and (10), respectively. After the distance calculation section
7 calculates each distance, the process proceeds to S11.

In S09, it is first judged whether or not the vertex x is the root
of the tree T_a. When the vertex x is not the root of the tree
T_a, the distance calculation section 7 calculates the distances
between subtrees in the RO trees D(T_a(x), T_b(y)),
D(\tilde{T}_a(x), \tilde{T}_b(y)), D(\tilde{T}_a(x), T_b(y)), and
D(T_a(x), \tilde{T}_b(y)) using the formulae (7), (9), (11), and
(13), respectively. On the other hand, when the vertex x is the
root of the tree T_a, the distance calculation section 7
calculates the distances between subtrees in the RO trees
D(T_a(x), T_b(y)) and D(T_a(x), \tilde{T}_b(y)) using the formulae
(7) and (13), respectively. After the distance calculation section
7 calculates each distance, the process proceeds to S11.

In S10, it is first judged whether or not the vertexes x and y are
the roots of the trees T_a and T_b, respectively. When neither the
vertex x nor the vertex y is the root of its tree, the distance
calculation section 7 calculates the distances between subtrees in
the RO trees D(T_a(x), T_b(y)), D(\tilde{T}_a(x), \tilde{T}_b(y)),
D(\tilde{T}_a(x), T_b(y)), and D(T_a(x), \tilde{T}_b(y)) using the
formulae (1) to (4), respectively. When the vertex x is the root
of the tree T_a and the vertex y is not the root of the tree T_b,
the distance calculation section 7 calculates the distances
between subtrees in the RO trees D(T_a(x), T_b(y)) and
D(T_a(x), \tilde{T}_b(y)) using the formulae (1) and (4),
respectively. When the vertex x is not the root of the tree T_a
and the vertex y is the root of the tree T_b, the distance
calculation section 7 calculates the distances between subtrees in
the RO trees D(T_a(x), T_b(y)) and D(\tilde{T}_a(x), T_b(y)) using
the formulae (1) and (3), respectively. After the distance
calculation section 7 calculates each distance, the process
proceeds to S11.
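
The branching of steps S06 to S10 amounts to a small decision table that selects which formula numbers are applied for a given pair of vertexes. The following Python sketch restates that table only; it does not evaluate the formulae. The function name formulas_for_step and its boolean parameters are hypothetical, and the final case in which both vertexes are roots in S10 is not spelled out in the description, so returning only the formula (1) there is an assumption of this sketch.

```python
def formulas_for_step(a_is_single, b_is_single, x_is_root, y_is_root):
    """Return the formula numbers used for the vertex pair (x, y).

    a_is_single: the subtree at x consists of one vertex (S06).
    b_is_single: the subtree at y consists of one vertex (S07).
    x_is_root / y_is_root: x (or y) is the root of T_a (or T_b).
    """
    if a_is_single:
        # S08: only the root test on y matters.
        return (6, 10) if y_is_root else (6, 8, 10, 12)
    if b_is_single:
        # S09: only the root test on x matters.
        return (7, 13) if x_is_root else (7, 9, 11, 13)
    # S10: neither subtree is a single vertex.
    if not x_is_root and not y_is_root:
        return (1, 2, 3, 4)
    if x_is_root and not y_is_root:
        return (1, 4)
    if not x_is_root and y_is_root:
        return (1, 3)
    # Assumption: both are roots, so only the plain tree distance remains.
    return (1,)


print(formulas_for_step(False, False, True, False))
```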

Next, the distance calculation section 7 determines whether or not
y = 1, that is, whether or not the vertex y is the root of the
tree T_b (S11). When y ≠ 1 (No at S11), y is decremented by one
(S12). Then, the process returns to S05. When y = 1 (Yes at S11),
the distance calculation section 7 determines whether or not
x = 1, that is, whether or not the vertex x is the root of the
tree T_a (S13). When x ≠ 1 (No at S13), x is decremented by one
(S14). Then, the process returns to S04. When x = 1 (Yes at S13),
this means that the distances between all subtrees, including the
trees T_a and T_b themselves, have been calculated. In other
words, the distance D(T_a(1), T_b(1)) has already been obtained.
Therefore, the distance calculation section 7 outputs the distance
D(T_a(1), T_b(1)) to the semantic content comparison section 8
through the memory 18. The semantic content comparison section 8
obtains a distance between the text sentences S_1 and S_2 on the
basis of the input distance D(T_a(1), T_b(1)) and the formulae 16
and 17 (S15).
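
The control flow of steps S03, S04, and S11 to S14 is a pair of nested countdown loops over the vertex numbers. The Python sketch below reproduces only that visiting order, under the assumption that the distance calculations of S05 to S10 would run at the marked position; the function name visit_order is hypothetical.

```python
def visit_order(n1, n2):
    """Return the (x, y) pairs in the order the flow chart visits them.

    n1 and n2 are the numbers of vertexes of T_a and T_b.
    """
    order = []
    x = n1                        # S03
    while True:
        y = n2                    # S04
        while True:
            order.append((x, y))  # S05 to S10 would be evaluated here
            if y == 1:            # S11: y is the root of T_b
                break
            y -= 1                # S12
        if x == 1:                # S13: x is the root of T_a
            break
        x -= 1                    # S14
    return order                  # the last pair (1, 1) corresponds to S15


print(visit_order(2, 3))
```

Because the vertexes are numbered in depth-first order, every descendant carries a larger number than its ancestors, so counting down visits each subtree before the subtrees that contain it, and the pair (1, 1) for the whole trees comes last.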

[Calculation procedure of the distance between R trees]

Next, a procedure for converting text sentences S_1 and S_2 into
R trees to obtain a distance between the text sentences S_1 and
S_2 will be described with reference to a flow chart shown in
Fig. 13.

The two input text sentences S_1 and S_2 are converted into R
trees T_a and T_b by using the morphological analysis section 2,
the semantic analysis section 3, and the tree structure conversion
section 4 (S21). As shown in Fig. 4, the word information is
allotted to the vertexes of the trees T_a and T_b, and the case
information is allotted to the edges of the trees T_a and T_b.
Numbers from 1 to n are allotted to the vertexes and edges of the
trees T_a and T_b (n denotes a positive integer). The numbers are
allotted in depth-first order from the root of the R tree (S22).

Next, x is set to n1 and y is set to n2 (n1 and n2 are the number
of vertexes of the tree T_a and the number of vertexes of the tree
T_b, respectively) (S23 and S24). The distance D(F_a(x), F_b(y))
between an unordered forest F_a(x) and an unordered forest F_b(y)
is calculated using the formula 14 (S25). Incidentally, when
distances between trees, between subtrees, and between forests are
calculated, the distance calculation section 7 obtains the vertex
substitution weight S(x, y), the vertex deletion weight g(y), the
vertex insertion weight r(x), the edge substitution weight S(x, y),
the edge deletion weight g(y), and the edge insertion weight r(x)
from the word-mapping-weight calculation section 5 and the
case-mapping-weight calculation section 6 to calculate the
distance.

Subsequently, it is judged whether or not a subtree T_a(x) (or
\tilde{T}_a(x)) is a subtree consisting of one vertex (S26). If
yes, the process proceeds to S28. If no, the process proceeds to
S27.

In S27, it is judged whether or not a subtree T_b(y) (or
\tilde{T}_b(y)) is a subtree consisting of one vertex. If yes, the
process proceeds to S29. If no, the process proceeds to S30.

In S28, it is first judged whether or not the vertex y is the root
of the tree T_b. When the vertex y is not the root of the tree
T_b, the distance calculation section 7 calculates the distances
between subtrees in the R trees D(T_a(x), T_b(y)),
D(\tilde{T}_a(x), \tilde{T}_b(y)), D(\tilde{T}_a(x), T_b(y)), and
D(T_a(x), \tilde{T}_b(y)) using the formulae (6), (8), (10), and
(12), respectively. On the other hand, when the vertex y is the
root of the tree T_b, the distance calculation section 7
calculates the distances between subtrees in the R trees
D(T_a(x), T_b(y)) and D(\tilde{T}_a(x), T_b(y)) using the formulae
(6) and (10), respectively. After the distance calculation section
7 calculates each distance, the process proceeds to S31.

In S29, it is first judged whether or not the vertex x is the root
of the tree T_a. When the vertex x is not the root of the tree
T_a, the distance calculation section 7 calculates the distances
between subtrees in the R trees D(T_a(x), T_b(y)),
D(\tilde{T}_a(x), \tilde{T}_b(y)), D(\tilde{T}_a(x), T_b(y)), and
D(T_a(x), \tilde{T}_b(y)) using the formulae (7), (9), (11), and
(13), respectively. On the other hand, when the vertex x is the
root of the tree T_a, the distance calculation section 7
calculates the distances between subtrees in the R trees
D(T_a(x), T_b(y)) and D(T_a(x), \tilde{T}_b(y)) using the formulae
(7) and (13), respectively. After the distance calculation section
7 calculates each distance, the process proceeds to S31.

In S30, it is first judged whether or not the vertexes x and y are
the roots of the trees T_a and T_b, respectively. When neither the
vertex x nor the vertex y is the root of its tree, the distance
calculation section 7 calculates the distances between subtrees in
the R trees D(T_a(x), T_b(y)), D(\tilde{T}_a(x), \tilde{T}_b(y)),
D(\tilde{T}_a(x), T_b(y)), and D(T_a(x), \tilde{T}_b(y)) using the
formulae (1) to (4), respectively. When the vertex x is the root
of the tree T_a and the vertex y is not the root of the tree T_b,
the distance calculation section 7 calculates the distances
between subtrees in the R trees D(T_a(x), T_b(y)) and
D(T_a(x), \tilde{T}_b(y)) using the formulae (1) and (4),
respectively. When the vertex x is not the root of the tree T_a
and the vertex y is the root of the tree T_b, the distance
calculation section 7 calculates the distances between subtrees in
the R trees D(T_a(x), T_b(y)) and D(\tilde{T}_a(x), T_b(y)) using
the formulae (1) and (3), respectively. After the distance
calculation section 7 calculates each distance, the process
proceeds to S31.

Next, the distance calculation section 7 determines whether or not
y = 1, that is, whether or not the vertex y is the root of the
tree T_b (S31). When y ≠ 1 (No at S31), y is decremented by one
(S32). Then, the process returns to S25. When y = 1 (Yes at S31),
the distance calculation section 7 determines whether or not
x = 1, that is, whether or not the vertex x is the root of the
tree T_a (S33). When x ≠ 1 (No at S33), x is decremented by one
(S34). Then, the process returns to S24. When x = 1 (Yes at S33),
this means that the distances between all subtrees, including the
trees T_a and T_b themselves, have been calculated. In other
words, the distance D(T_a(1), T_b(1)) has already been obtained.
Therefore, the distance calculation section 7 outputs the distance
D(T_a(1), T_b(1)) to the semantic content comparison section 8
through the memory 18. The semantic content comparison section 8
obtains a distance between the text sentences S_1 and S_2 on the
basis of the input distance D(T_a(1), T_b(1)) and the formulae 16
and 17 (S35).

The distance D(T_a, T_b) = D(T_a(x=1), T_b(y=1)) between either
the RO trees or the R trees can be obtained by using the
above-explained methods.

Next, the semantic content comparison section 8 obtains a distance
between text sentences by using formula 16 or formula 17.

A symbol "D(S_1, S_2)" indicates a distance between a sentence
"S_1" and a sentence "S_2", symbol "T_1" represents a tree
structure (either RO tree or R tree) of the sentence "S_1", symbol
"T_2" represents a tree structure (either RO tree or R tree) of
the sentence "S_2", and symbol "D(T_1, T_2)" indicates a distance
between the tree T_1 and the tree T_2.



D(S_1, S_2) = D(T_1, T_2)    • • • (16)

D(S_1, S_2) = \frac{D(T_1, T_2)}{|T_1| + |T_2|}    • • • (17)



[Example] 

Next, the operation of the apparatus and the method for comparing
the semantic contents of text sentences according to the
embodiment of the present invention will be described using a
specific example.

A process and a result of obtaining the similarity (or the
difference) between a sentence A "my wife, Hanako, has a cold" and
a sentence B "my wife has a cold" will be given, using the
apparatus for comparing the semantic contents of the text
sentences according to the embodiment of the invention. In this
example, the word deletion weight, the word insertion weight, the
case deletion weight, and the case insertion weight are set to 70.
The word substitution weight is set to 100, and the case
substitution weight is also set to 100.

First, both the sentence A and the sentence B are morphologically
analyzed. Then, the syntax and semantic analysis are performed
with respect to the sentences A and B. As a result, these two
sentences A and B are converted into, for instance, an RO tree T_A
and an RO tree T_B shown in Fig. 9A and Fig. 9B, respectively.

Next, the distance between the two converted RO trees is
calculated using the formula 1. Finally, the distance between the
two text sentences A and B is calculated using either the formula
16 or the formula 17.

When the formula 16 is used, the distance between the text
sentence A and the text sentence B becomes D(A, B) = 140. When the
formula 17 is used, the distance between the text sentence A and
the text sentence B becomes D(A, B) = 20 (= 140 / 7). In this
case, the distance between the two RO trees T_A and T_B is
D(T_A, T_B) = 140, and the total number of vertexes of the two RO
trees T_A and T_B is equal to 7.
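
The arithmetic of this example can be checked with a short Python sketch. The sketch reads the formula 17 as the tree distance divided by the total number of vertexes of the two trees, which is inferred from the value 20 (= 140 / 7) given above. The function name sentence_distance is hypothetical, and the split of the 7 vertexes into 4 and 3 is an assumption that follows from the sentence A containing one more word vertex ("Hanako") than the sentence B.

```python
def sentence_distance(tree_distance, vertices_1, vertices_2, normalized=False):
    """Distance between two sentences from the distance between their trees.

    normalized=False follows the formula 16 (tree distance as is).
    normalized=True follows the formula 17 as read in this sketch
    (tree distance divided by the total number of vertexes).
    """
    if normalized:
        return tree_distance / (vertices_1 + vertices_2)
    return tree_distance


# Values from the example: D(T_A, T_B) = 140, 7 vertexes in total
# (assumed here to be 4 in T_A and 3 in T_B).
d16 = sentence_distance(140, 4, 3)
d17 = sentence_distance(140, 4, 3, normalized=True)
print(d16, d17)
```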

Fig. 10 shows one mapping between the RO trees that gives the
distance D(T_A, T_B). As shown in this drawing, the distance
between the two RO trees T_A and T_B becomes equal to the sum of a
deletion weight of 70, which is required for deleting the word
"Hanako", and a deletion weight of 70, which is required for
deleting the case "ADJUNCT".
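
The sum described for the mapping of Fig. 10 can be written as a small Python sketch that totals the weights of the editing operations left over by a mapping. The weight values are those stated for this example; the operation names, the WEIGHTS table, and the function mapping_cost are hypothetical, and the sketch assumes that vertexes and edges mapped onto identical words and cases contribute nothing to the distance.

```python
# Weights stated for this example.
WEIGHTS = {
    "delete_word": 70,
    "insert_word": 70,
    "delete_case": 70,
    "insert_case": 70,
    "substitute_word": 100,
    "substitute_case": 100,
}


def mapping_cost(operations):
    """Sum the weights of the editing operations implied by a mapping.

    operations is a list of (operation kind, label) pairs; identical
    mapped items are assumed to cost nothing and are not listed.
    """
    return sum(WEIGHTS[kind] for kind, _label in operations)


# Operations of the mapping between T_A and T_B described above.
operations = [("delete_word", "Hanako"), ("delete_case", "ADJUNCT")]
cost = mapping_cost(operations)
print(cost)
```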

Accordingly, in the text sentence comparing apparatus and the text
sentence comparing method according to the invention, text
sentences are morphologically analyzed and semantically analyzed.
Then, the sentence structure and meaning of the entire analyzed
text sentences are converted into either RO trees or R trees in
graph theory. That is, the sentence structure and meaning of the
entire text sentences are converted into either the RO trees or
the R trees. The word information (including the attributes of the
words) and the dependency relation information (case information)
between words appearing in the text sentences are stored in the
vertexes and edges of either the RO trees or the R trees,
respectively. A distance between either the RO trees or the R
trees, which is based on a correspondence relationship between the
vertexes and edges, is applied as a distance measuring differences
in semantic contents between the text sentences. The differences
in semantic contents between the text sentences are compared by
using the distance between either the RO trees or the R trees.
Thereby, the difference in semantic contents between the two input
text sentences can be obtained with high precision and in real
time.

Specifically, in the invention, the distance between the text
sentences is defined based on the difference in the word
information between the text sentences, the difference in the case
information, and the difference in the entire constructions
between the text sentences. Therefore, the distance functions
according to the invention have the following three desirable
properties. That is, (1) a distance between two text sentences
whose meanings are similar to each other and whose constructions
are similar to each other is obtained as a small value; (2) a
distance between two text sentences whose meanings are different
from each other and whose constructions are not similar to each
other is obtained as a very large value; and (3) a distance
between two text sentences whose meanings are different from each
other, but whose constructions are similar to each other, is
obtained based upon either a difference in word information or
both the difference in word information and a difference in case
information. As a result, the distance between the two text
sentences can be calculated with high precision.

Also, in this example, as to the RO tree, the distance between the
two text sentences can be calculated on the order of n^2 (namely,
the square of the total number "n" of vertexes of an RO tree,
i.e., "O(n^2)"). As to the R tree, the distance between the two
text sentences can be calculated on the order of n^2 and "m"
(namely, the square of the total number "n" of vertexes of an R
tree multiplied by the maximum number "m" of children, i.e.,
"O(mn^2)"). Accordingly, the distance between the two text
sentences can be calculated in real time.

It should also be noted that the arrangement of the text sentence
comparing apparatus of the present invention is not limited to the
above-explained arrangements, but may be realized by employing
various other arrangements. Alternatively, the inventive idea of
the present invention may be provided in the form of, for example,
a program capable of realizing the comparing method according to
the present invention.

Also, the application field of the present invention is not
limited to the above-described application fields; the present
invention may be applied to various other technical fields.

As for the various sorts of process operations executed in the
present invention, such an arrangement may be employed in which,
for example, a processor executes a control program stored in a
ROM (Read-Only Memory) in a hardware resource equipped with the
processor and a memory. Also, the respective function means for
executing these process operations may be arranged as independent
hardware circuits.

Alternatively, the present invention may be grasped as a computer
readable recording medium and the relevant program itself, where
the computer readable recording medium is realized as, for
example, a CD (Compact Disc)-ROM or a floppy (registered
trademark) disk in which the above-explained control program has
been stored in advance. When this control program is loaded from
the recording medium into the computer and executed by the
processor, the process operations according to the present
invention can be executed.

As previously explained in detail, in accordance with the text
sentence comparing apparatus and the text sentence comparing
method related to the invention, the entire constructions and the
meanings of the text sentences are expressed by either the RO
trees or the R trees in graph theory, and the differences in
semantic content between the text sentences are compared by using
either the distances between the RO trees based on the
correspondence relationships among the vertexes and the edges or
the distances between the R trees based on the correspondence
relationships among the vertexes and the edges. As a result, the
difference in semantic content between the two input text
sentences can be grasped with high precision and in real time. In
accordance with the invention, for instance, not only can the
semantic contents of documents be compared with each other and the
documents be classified based on the semantic contents, but the
information searching intention of the user can also be
understood. In other words, since the request of the user, which
is expressed in natural language, is compared with the stored
content of the database which has been constructed by way of
previous learning, the information searching intention of the user
can be predicted.

In the embodiment of the invention, the description has been given
for English text sentences. It goes without saying that the
invention can be applied to any natural language, such as
Japanese, Chinese, French, and German.